1  Introduction


Consider the Selection problem, which we denote (S): given an input vector
$X = (x_1, \ldots, x_{n-1})$ and an input $y$, where all inputs are integers, find the
index $i$ such that $x_i < y \le x_{i+1}$. (By definition, $x_0 = -\infty$ and
$x_n = \infty$.) Problem (S) is just the problem of searching in a sorted table of integers.

Snir [6] considered this problem in the context of parallel computation in two
different PRAM models. A PRAM consists of a set of processors $P_0, P_1, \ldots$ which
communicate by means of cells $M_0, M_1, \ldots$ of shared memory. One step of
computation consists of three phases. In the read phase, each processor may choose one
cell to read from. In the compute phase, an arbitrary amount of local computation
can take place. In the write phase, each processor may choose one cell to write into.
The models that Snir considered differ in the degree of simultaneous access to shared
memory that is allowed. In the EREW PRAM, no two processors may simultaneously
read from or write into the same cell. In the CREW PRAM, simultaneous read access is
permitted, but not simultaneous write access.
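The access rules that separate the two models can be made concrete with a small checker. This is purely an illustrative sketch: the function `step_is_legal` and its encoding of a step (for each processor, the cell it reads in the read phase and the cell it writes in the write phase, or `None` if it is idle) are hypothetical and not part of either model's formal definition.

```python
from collections import Counter
from typing import Optional, Sequence

def step_is_legal(reads: Sequence[Optional[int]],
                  writes: Sequence[Optional[int]],
                  model: str) -> bool:
    """Check the memory accesses of one PRAM step.

    reads[p] (writes[p]) is the cell that processor P_p accesses in the read
    (write) phase, or None if P_p is idle in that phase.
    """
    def conflict(cells: Sequence[Optional[int]]) -> bool:
        # True if some cell is accessed by two or more processors.
        counts = Counter(c for c in cells if c is not None)
        return any(k > 1 for k in counts.values())

    if model == "EREW":
        return not conflict(reads) and not conflict(writes)
    if model == "CREW":
        return not conflict(writes)   # concurrent reads are allowed
    raise ValueError(f"unknown model: {model}")

# Three processors all read cell 0 (say, where y is stored) and write distinct cells.
print(step_is_legal([0, 0, 0], [1, 2, 3], "CREW"))  # True
print(step_is_legal([0, 0, 0], [1, 2, 3], "EREW"))  # False
```

A step in which every processor reads the cell holding $y$ is thus legal on a CREW PRAM but not on an EREW PRAM.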

It is easy to see that the complexity of problem (S) on a CREW PRAM is $O(1)$,
and Snir [6] proved matching $\Theta(\sqrt{\log n})$ upper and lower bounds for solving
the problem in the EREW model. His proof uses Ramsey’s Theorem to restrict
the set of inputs so that the behaviour of an algorithm solving the problem depends
only on the relative order of the input values. Essentially, processors may then only
make comparisons and gather input values, and an information-theoretic argument
shows that this cannot be done quickly. The use of Ramsey’s Theorem means that the
lower bound holds only if the input numbers are drawn from a large enough range.
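One way to see the $O(1)$ CREW bound for (S) is sketched below, assuming $n$ processors and the convention $x_i < y \le x_{i+1}$: processor $P_i$ reads $y$, $x_i$ and $x_{i+1}$ and writes $i$ if that pair brackets $y$. The function name `crew_select` is hypothetical, and the single parallel step is simulated sequentially.

```python
import math
from typing import List

def crew_select(xs: List[int], y: int) -> int:
    """Simulate the one-step CREW algorithm for problem (S).

    xs holds x_1, ..., x_{n-1} in nondecreasing order.
    """
    x = [-math.inf] + list(xs) + [math.inf]   # x[i] is x_i, with the sentinels
    n = len(xs) + 1
    # Single parallel step: P_i (0 <= i <= n-1) reads y, x_i and x_{i+1}.
    # y is read by every processor and x_i by both P_{i-1} and P_i,
    # so the reads are concurrent; this is where CREW is needed.
    writers = [i for i in range(n) if x[i] < y <= x[i + 1]]
    # On sorted input exactly one processor writes, so the write is exclusive.
    assert len(writers) == 1
    return writers[0]

print(crew_select([2, 4, 6], 5))  # 2, since x_2 = 4 < 5 <= 6 = x_3
```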

A more serious weakness of this lower bound proof than the size of the range
needed is that the problem is only defined on a restricted set of inputs (termed a
cleft domain in [4]). The problem of testing whether the input is valid (that is,
whether the $x_i$ are sorted) requires $\Omega(\log n)$ time in the CREW model; this
follows from the lower bound of [1] on computing the OR of $n$ bits. It could be argued
that knowing that the input is of a special form gives the CREW PRAM information
that the EREW PRAM cannot use, and thus the comparison is “unfair”. Examples
have been given of PRAM models which can be separated by functions defined on
partial domains, but which are equal or incomparable when considering functions on
full domains ([2], [3]).
In the next section we show how the Selection problem (S) can be reformulated
as a Decision Tree problem, such that the output is well defined for any input.

2  Generalization  of  the  Selection  Problem  to  a 
Decision  Tree  Problem 

Let $T$ be a complete rooted binary tree with $n - 1$ nodes, where $n = 2^h$ for some
integer $h$. An input variable is associated with each node of $T$. The variable
$x_{n/2}$ is associated with the root, $x_{n/4}$ and $x_{3n/4}$ with the left and right child
of the root respectively, and so on. More precisely, if a node has $x_i$ associated with
it, where $i = (2k+1)2^l$ with $l \ge 1$, then its left child has $x_j$ associated with it
and its right child has $x_m$ associated with it, where $j = (2k + \frac{1}{2})2^l = i - 2^{l-1}$
and $m = (2k + \frac{3}{2})2^l = i + 2^{l-1}$. The nodes with odd index ($l = 0$) are the
leaves. We number the nodes, giving a node the same index as the variable associated with it.
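The indexing rule can be checked mechanically. In this sketch (function names hypothetical), we take $n = 2^h$ so that the nodes are $1, \ldots, n-1$ and the root $n/2$ is an integer; the children of node $i = (2k+1)2^l$ are $i \mp 2^{l-1}$, and an in-order traversal from the root visits $1, 2, \ldots, n-1$ in order, which is why the tree is a binary search tree over the positions of $X$.

```python
from typing import List, Optional, Tuple

def children(i: int) -> Optional[Tuple[int, int]]:
    """Children of node i = (2k + 1) * 2**l, or None if i is a leaf (l = 0)."""
    l = (i & -i).bit_length() - 1    # exponent of the largest power of 2 dividing i
    if l == 0:
        return None
    half = 1 << (l - 1)
    return i - half, i + half         # (2k + 1/2) 2^l and (2k + 3/2) 2^l

def inorder(i: int) -> List[int]:
    """Node indices of the subtree rooted at i, in in-order (left, node, right)."""
    kids = children(i)
    if kids is None:
        return [i]
    left, right = kids
    return inorder(left) + [i] + inorder(right)

h = 3
n = 2 ** h                            # nodes 1..n-1 = 1..7, root n/2 = 4
print(children(n // 2))               # (2, 6)
print(inorder(n // 2))                # [1, 2, 3, 4, 5, 6, 7]
```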

We now state the Decision Tree problem, denoted problem (D). A path from each
node to one of the leaves is defined inductively: the successor of internal node $i$ is the
left child of $i$ if $y \le x_i$ and the right child of $i$ if $y > x_i$. In particular there is a
unique root-leaf path, terminating at some leaf $j$. The output of the problem is $j - 1$
if $y \le x_j$ and $j$ if $y > x_j$.
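A direct sequential evaluation of problem (D) follows the defining path from the root. This is a sketch with a hypothetical function name; it assumes $n = 2^h$ (nodes $1, \ldots, n-1$), assumes that ties $y = x_i$ go left, and stores $x$ as a dictionary from node index to value, so the input need not be sorted.

```python
from typing import Dict

def solve_D(x: Dict[int, int], y: int, h: int) -> int:
    """Sequentially evaluate problem (D) on the tree with nodes 1..n-1, n = 2**h.

    x maps node index i to x_i; ties y == x_i go left (assumed convention).
    """
    i = 1 << (h - 1)                        # the root holds x_{n/2}
    step = 1 << (h - 2) if h >= 2 else 0    # distance from i to its children
    while step:                             # i is internal while step > 0
        i = i - step if y <= x[i] else i + step
        step >>= 1
    # i is now the leaf j at the end of the root-leaf path.
    return i - 1 if y <= x[i] else i

# An unsorted input on 7 nodes: the path is 4 -> 6 -> 5, and y <= x_5.
x = {1: 1, 2: 5, 3: 1, 4: 0, 5: 7, 6: 9, 7: 3}
print(solve_D(x, 4, 3))  # 4
```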

Theorem 2.1 Problem (D) can be solved in $O(\log\log n)$ time in the CREW model.

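Proof: Problem (D) is solved in the CREW model by using the “path doubling”
technique. A processor $P_i$ is associated with each node $i$ in the tree. $P_i$ reads $y$
and $x_i$, thereby determining the successor of node $i$. This information is stored in
memory, say in location $i$ of an array $S$; for a leaf $j$, let $S(j) = j$. Then, in parallel,
each processor $P_i$ executes the instruction $S(i) \leftarrow S(S(i))$, a total of
$\lceil \log\log n \rceil$ times. After $t$ such rounds, $S(i)$ is the node $2^t$ steps further
along the path from $i$ (or the leaf at its end), and every root-leaf path has fewer than
$\log n$ edges, so afterwards $S(n/2) = j$ means that node $j$ is the leaf at the end of the
path from the root. The answer can then be computed in $O(1)$ steps. Concurrent reads
are needed, since every processor reads $y$ and many processors may read the same cell $S(S(i))$.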
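The path-doubling proof can be simulated sequentially to check the round count. In the sketch below (hypothetical names; ties $y = x_i$ assumed to go left; $n = 2^h$), each parallel round is modelled by rebuilding $S$ from its previous values, which mirrors the synchronous read-then-write step. The root-leaf path has $h - 1$ edges, so $\lceil \log_2(h-1) \rceil$ doubling rounds suffice; the function returns the answer together with that round count.

```python
from typing import Dict, Tuple

def path_doubling_D(x: Dict[int, int], y: int, h: int) -> Tuple[int, int]:
    """Simulate the CREW path-doubling algorithm for (D); n = 2**h.

    Ties y == x_i go left (assumed convention). Returns (answer, rounds).
    """
    n = 1 << h
    # First step: P_i reads y and x_i and writes the successor of i into S[i].
    S = {}
    for i in range(1, n):
        l = (i & -i).bit_length() - 1
        if l == 0:
            S[i] = i                          # a leaf is its own successor
        else:
            half = 1 << (l - 1)
            S[i] = i - half if y <= x[i] else i + half
    # (h - 2).bit_length() == ceil(log2(h - 1)) for h >= 2; 0 rounds when h == 1.
    rounds = max(h - 2, 0).bit_length()
    for _ in range(rounds):
        # Synchronous S(i) <- S(S(i)): every read uses the previous round's S.
        S = {i: S[S[i]] for i in S}
    j = S[n // 2]                             # leaf at the end of the root path
    return (j - 1 if y <= x[j] else j), rounds

print(path_doubling_D({i: 2 * i for i in range(1, 32)}, 7, 5))  # (3, 2)
```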

To see that a CREW PRAM requires $\Omega(\log\log n)$ time to solve problem (D), we
invoke a result of Simon [5], which states that any nondegenerate Boolean function on
$n$ variables (that is, one that depends on all of its variables) requires $\Omega(\log\log n)$
steps to compute on a CREW PRAM. Let $f$ denote the function computed by problem (D).
It is not a Boolean function, since its inputs are tuples of integers, but we can construct
a Boolean function $g$ by letting $y = 1$, restricting $x_1, x_2, \ldots, x_{n-1}$ to take
values 0 or 1, and defining the output of $g$ to be $f \pmod 2$. The resulting $g$ is at
least as easy to compute as $f$, and is a nondegenerate Boolean function of $n - 1$
variables, so Simon's lower bound for $g$ applies to $f$.
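For small trees the claim that $g$ is nondegenerate can be verified by brute force over all $2^{n-1}$ Boolean inputs. The sketch assumes $n = 2^h$ and that ties go left, so with $y = 1$ the path turns left exactly at nodes with $x_i = 1$; the function names are hypothetical.

```python
from itertools import product
from typing import Tuple

def g(bits: Tuple[int, ...], h: int) -> int:
    """(D) mod 2 with y = 1 and x_i = bits[i - 1] in {0, 1}; n = 2**h.

    Assumes ties go left, so y = 1 <= x_i (i.e. x_i = 1) means "go left".
    """
    i = 1 << (h - 1)
    step = 1 << (h - 2) if h >= 2 else 0
    while step:
        i = i - step if bits[i - 1] == 1 else i + step
        step >>= 1
    out = i - 1 if bits[i - 1] == 1 else i
    return out % 2

def is_nondegenerate(h: int) -> bool:
    """True if, for every variable, flipping it changes g on some input."""
    m = (1 << h) - 1                       # number of variables, n - 1
    for v in range(m):
        if all(g(b, h) == g(b[:v] + (1 - b[v],) + b[v + 1:], h)
               for b in product((0, 1), repeat=m)):
            return False                   # g ignores variable x_{v+1}
    return True

print(is_nondegenerate(3))  # True
```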
Theorem 2.2 Problem (D) requires $\Theta(\sqrt{\log n})$ time to solve in the EREW model.


Proof: We first give an $O(\sqrt{\log n})$ algorithm; its purpose is to show that the
lower bound of Snir is the best possible, since the lower bound model does not charge
for local computation. As before, a processor $P_i$ is associated with node $i$. In the first
step of the algorithm, $P_i$ reads $x_i$ and stores this value in node $i$. We note that in
$O(1)$ steps a processor at a node can read any information stored in its left and right
children and combine it with the information it already has; since each node's cell is
read only by its parent, these reads are exclusive. Thus, in $O(\sqrt{\log n})$ steps, a node
$v$ that is at level $k\sqrt{\log n} + 1$ for some integer $k$ can gather the values of all
variables associated with nodes in the subtree of height $\sqrt{\log n}$ below $v$. Knowing
these values and the value of $y$, a processor can determine in one step the node that is
$\sqrt{\log n}$ levels below $v$ on the path from $v$. In effect, the binary tree has been
compressed into a tree of height $\sqrt{\log n}$ and fanout $2^{\sqrt{\log n}}$. The naive
sequential algorithm to find the bottom of the root-leaf path can now be run, taking
$O(\sqrt{\log n})$ steps.
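The two phases of the EREW algorithm can be simulated as follows. This is a sketch under stated assumptions: $n = 2^h$, ties go left, the block height is taken as $s = \lfloor\sqrt{h}\rfloor$ with $h = \log n$, every node (not only those at levels $ks + 1$) gathers its height-$s$ subtree since that costs nothing extra, and a shared-memory cell is modelled as a Python dictionary that may hold arbitrarily much information, as in the lower bound model. Names are hypothetical. The function returns the answer, the number of gather rounds and the number of walk steps.

```python
import math
from typing import Dict, Optional, Tuple

def kids(i: int) -> Optional[Tuple[int, int]]:
    """Children of node i = (2k + 1) * 2**l, or None for a leaf."""
    l = (i & -i).bit_length() - 1
    if l == 0:
        return None
    half = 1 << (l - 1)
    return i - half, i + half

def erew_compressed_walk(x: Dict[int, int], y: int, h: int) -> Tuple[int, int, int]:
    """Simulate the O(sqrt(log n)) EREW algorithm for (D); n = 2**h.

    Ties y == x_i go left (assumed). Returns (answer, gather_rounds, walk_steps).
    """
    n = 1 << h
    s = max(1, math.isqrt(h))             # block height, about sqrt(log n)
    # First step: P_i reads x_i into the cell of node i.
    info = {i: {i: x[i]} for i in range(1, n)}
    # Gather phase: s - 1 synchronous rounds. Each node reads the cells of its
    # two children; a cell is read only by its parent, so reads are exclusive.
    for _ in range(s - 1):
        new_info = {}
        for i, vals in info.items():
            merged = dict(vals)
            c = kids(i)
            if c is not None:
                merged.update(info[c[0]])
                merged.update(info[c[1]])
            new_info[i] = merged
        info = new_info
    # Walk phase: a single processor, holding y, reads one cell per step and
    # descends s levels using local computation only.
    v, walk_steps = n // 2, 0
    while kids(v) is not None:
        vals = info[v]                    # values of the height-s subtree below v
        for _ in range(s):
            c = kids(v)
            if c is None:
                break
            v = c[0] if y <= vals[v] else c[1]
        walk_steps += 1
    answer = v - 1 if y <= info[v][v] else v
    return answer, s - 1, walk_steps

print(erew_compressed_walk({i: 3 * i for i in range(1, 512)}, 100, 9))  # (33, 2, 3)
```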


To prove the lower bound, we show that problem (S) is reducible to problem (D)
in $O(1)$ time. In fact, problem (S) is just problem (D) restricted to inputs in which
the $x_i$ are sorted: the root-leaf path defined by problem (D) is exactly the sequence of
variables that would be queried by binary search. Snir's $\Omega(\sqrt{\log n})$ EREW lower
bound for (S) therefore applies to (D).
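The reduction can be checked empirically: on sorted inputs, evaluating (D) gives the same index as a binary search for the first position whose value is at least $y$. The sketch uses Python's `bisect.bisect_left` to solve (S), assumes $n = 2^h$, the convention $x_i < y \le x_{i+1}$ and that ties in (D) go left; the function names are hypothetical.

```python
import bisect
import random
from typing import Dict, List

def solve_D(x: Dict[int, int], y: int, h: int) -> int:
    """Evaluate (D) on nodes 1..n-1, n = 2**h; ties y == x_i go left (assumed)."""
    i = 1 << (h - 1)
    step = 1 << (h - 2) if h >= 2 else 0
    while step:
        i = i - step if y <= x[i] else i + step
        step >>= 1
    return i - 1 if y <= x[i] else i

def solve_S(xs: List[int], y: int) -> int:
    """Index i with x_i < y <= x_{i+1} (x_0 = -inf, x_n = +inf) for sorted xs."""
    return bisect.bisect_left(xs, y)       # number of entries strictly below y

random.seed(1)
for h in range(1, 7):
    n = 2 ** h
    for _ in range(200):
        xs = sorted(random.randint(0, 50) for _ in range(n - 1))
        y = random.randint(-5, 55)
        x = {i: xs[i - 1] for i in range(1, n)}   # node i holds x_i
        assert solve_D(x, y, h) == solve_S(xs, y)
print("(D) agrees with binary search on all sorted trials")
```

Duplicated values are included on purpose: a node sends the search left whenever $y \le x_i$, so the path still ends at the first position whose value is at least $y$.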

Another way of stating the separation implied by the previous two theorems is
that for each integer $T$, there exists a problem which can be solved in $T$ steps in the
CREW model, but which requires $2^{\Omega(T)}$ steps in the EREW model.
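The arithmetic behind this restatement, taking $T = \log\log n$ for problem (D) and ignoring constant factors:

```latex
\[
T = \log\log n
\;\Longrightarrow\; \log n = 2^{T}
\;\Longrightarrow\; \sqrt{\log n} = 2^{T/2} = 2^{\Omega(T)} .
\]
```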


3  Separations  on  Boolean  input  and  output 

These results can be extended slightly to show a lower bound for a problem with
integer input but Boolean output. The problem is just problem (D), but the output
is taken to be the output of problem (D) mod 2. To see that Snir’s lower bound
applies to this problem, one must examine Snir’s proof. He shows that if $o(\sqrt{\log n})$
steps are used by some algorithm, there exist two inputs in the restricted domain
and an integer $i$ such that the outputs of problem (S) on those two inputs are $i$ and
$i + 1$ respectively, and the computation of the EREW PRAM on the two inputs is
identical. For two such inputs, the outputs of problem (D) mod 2 also differ, so the
algorithm must be wrong on one of them, and the lower bound follows.
In [6], Snir gives a lower bound for a problem with Boolean input and output. The
problem is to identify the switching index when the input is a string of 0’s followed
by a string of 1’s. A lower bound of $\Omega(\log(n/p))$ time in the EREW model is proven,
where $p$ is the number of processors. The problem can be solved in $O(\log n / \log p)$
time in the CREW model.
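The $O(\log n / \log p)$ CREW bound for the switching-index problem can be reached by $p$-ary search; the sketch below is one way to do it, not necessarily the algorithm of [6]. In each round the $p$ processors probe $p$ evenly spaced cells of the current interval (all of them reading the shared interval endpoints, which requires concurrent read), and the interval shrinks by a factor of about $p + 1$. The sentinels $x_0 = 0$, $x_n = 1$ and the function name are assumptions of the sketch.

```python
from typing import List, Tuple

def switching_index(bits: List[int], p: int) -> Tuple[int, int]:
    """p-ary search for i with x_i = 0 and x_{i+1} = 1.

    bits = [x_1, ..., x_{n-1}] is a run of 0's followed by a run of 1's;
    sentinels x_0 = 0 and x_n = 1 are added. Returns (i, rounds).
    """
    x = [0] + list(bits) + [1]
    lo, hi = 0, len(x) - 1            # invariant: x[lo] == 0 and x[hi] == 1
    rounds = 0
    while hi - lo > 1:
        width = hi - lo
        # One parallel step: processor k probes an evenly spaced cell in [lo, hi).
        probes = [lo + width * (k + 1) // (p + 1) for k in range(p)]
        zeros = [q for q in probes if x[q] == 0]
        ones = [q for q in probes if x[q] == 1]
        lo = max(zeros, default=lo)
        hi = min(ones, default=hi)
        rounds += 1
    return lo, rounds

bits = [0] * 37 + [1] * 26            # n - 1 = 63 inputs, switching after x_37
print(switching_index(bits, 1))       # (37, 6): plain binary search
print(switching_index(bits, 7))       # (37, 2)
```

With $p = 1$ this degenerates to ordinary binary search; larger $p$ trades processors for fewer rounds.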

In the same vein as in the previous section, it is easy to see that if we modify
problem (D) to restrict the inputs to be Boolean, and further fix $y = 1$, then
Snir’s problem is just the modified problem defined on a restricted set of inputs.
Thus the modified problem (D) takes at least $\Omega(\log(n/p))$ time in the EREW
model. Problem (D) can be solved in $O((\log n / \log p)\log\log p)$ time in the CREW
model. The $p$ processors are assigned to nodes in the first $\log p$ levels of the tree and
in $O(\log\log p)$ steps (for instance by the path-doubling technique of Theorem 2.1) can
find out which node at the lowest level is reached by the root-leaf path. This procedure
is then repeated $\log n / \log p$ times until the bottom of the tree is reached.
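The block-by-block procedure can be simulated to check the round count $\lceil (\log n - 1)/b \rceil\,(1 + \lceil \log_2 b \rceil)$, where $b = \lfloor \log_2 p \rfloor$ levels are handled per block. Using path doubling inside each block is an assumption of this sketch (the text only states the $O(\log\log p)$ cost); $n = 2^h$, ties going left, and the names are likewise assumptions.

```python
from typing import Dict, Optional, Tuple

def children(i: int) -> Optional[Tuple[int, int]]:
    """Children of node i = (2k + 1) * 2**l, or None for a leaf."""
    l = (i & -i).bit_length() - 1
    if l == 0:
        return None
    half = 1 << (l - 1)
    return i - half, i + half

def blocked_path_doubling(x: Dict[int, int], y: int, h: int, p: int) -> Tuple[int, int]:
    """Solve (D) on n = 2**h with p processors, b = floor(log2 p) levels at a time.

    Ties y == x_i go left (assumed). Returns (answer, rounds).
    """
    b = max(1, p.bit_length() - 1)        # levels per block: floor(log2 p)
    v, rounds = 1 << (h - 1), 0           # start at the root, node n/2
    while children(v) is not None:
        # The block: nodes at depths 0..b-1 below v, at most 2^b - 1 <= p of them.
        block, frontier = [], [v]
        for _ in range(b):
            block.extend(frontier)
            frontier = [u for w in frontier if children(w) for u in children(w)]
        # One step: each assigned processor reads y and x_w and records a successor.
        S = {}
        for w in block:
            c = children(w)
            S[w] = w if c is None else (c[0] if y <= x[w] else c[1])
        rounds += 1
        # ceil(log2 b) doubling rounds; nodes outside the block act as endpoints.
        for _ in range((b - 1).bit_length()):
            S = {w: S.get(S[w], S[w]) for w in S}
            rounds += 1
        v = S[v]                          # node reached b levels down (or a leaf)
    return (v - 1 if y <= x[v] else v), rounds

print(blocked_path_doubling({i: 3 * i for i in range(1, 512)}, 100, 9, 16))  # (33, 6)
```

With $p = 16$ and $\log n = 9$, two blocks of four levels each cost one successor step plus two doubling rounds.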